Inconsistency of evolutionary tree topology reconstruction methods when substitution rates vary across characters.
Author
Abstract
A fundamental problem in reconstructing the evolutionary history of a set of species is to infer the topology of the evolutionary tree that relates those species. A statistical method for estimating such a topology from character data is called consistent if, given data from more and more characters, the method is sure to converge to the true topology. A number of popular methods are based on modeling the evolution of each character as a Markov process along the evolutionary tree. The standard models further assume that each character has in fact evolved according to the same Markov process. This homogeneity assumption is unrealistic; for example, different types of characters are known to experience substitutions at different rates. Certain distance and maximum likelihood methods for topology estimation have been shown to be consistent under the homogeneity assumption. Here we give examples showing that these methods can fail to be consistent when the homogeneity assumption is relaxed. The examples are very simple, requiring only four taxa, binary characters, and characters that evolve at two different rates.
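The failure mode described in the abstract can be illustrated numerically. The following is a minimal sketch, not the paper's own construction: it assumes the symmetric two-state (binary) Markov model, a simple four-point distance criterion as a stand-in for a distance method, and arbitrarily chosen branch lengths and rates. All function and parameter names are hypothetical. It works with the limiting (infinite-data) proportions of differing characters, so whatever topology it returns is the one the method converges to.

```python
import math

def p_differ(path_len, rates, weights):
    """Limiting proportion of characters that differ between two taxa
    separated by `path_len`, under a symmetric two-state model in which
    a fraction weights[k] of characters evolves at rate rates[k]."""
    return sum(w * 0.5 * (1.0 - math.exp(-2.0 * r * path_len))
               for r, w in zip(rates, weights))

def corrected_distance(p):
    """Distance correction that is exact only if every character
    evolves at the same rate (the homogeneity assumption)."""
    return -0.5 * math.log(1.0 - 2.0 * p)

def four_point_topology(rates, weights,
                        long_branch=1.0, short_branch=0.05, internal=0.05):
    """Return the four-taxon split with the smallest pairwise-distance sum.

    Assumed true tree: ((A,B),(C,D)), with long pendant branches to A and C
    and short pendant branches to B and D (illustrative values only).
    """
    path = {
        ("A", "B"): long_branch + short_branch,
        ("C", "D"): long_branch + short_branch,
        ("A", "C"): 2 * long_branch + internal,
        ("B", "D"): 2 * short_branch + internal,
        ("A", "D"): long_branch + short_branch + internal,
        ("B", "C"): long_branch + short_branch + internal,
    }
    d = {pair: corrected_distance(p_differ(length, rates, weights))
         for pair, length in path.items()}
    sums = {
        "AB|CD": d[("A", "B")] + d[("C", "D")],
        "AC|BD": d[("A", "C")] + d[("B", "D")],
        "AD|BC": d[("A", "D")] + d[("B", "C")],
    }
    return min(sums, key=sums.get), sums

# Homogeneous case: a single rate, so the correction is exact.
print(four_point_topology([1.0], [1.0])[0])
# Two rate classes with the same mean rate, analyzed as if homogeneous.
print(four_point_topology([0.1, 1.9], [0.5, 0.5])[0])
```

With a single rate the corrected distances equal the true path lengths and the true split AB|CD has the smallest sum. With the two-rate mixture the correction becomes a concave function of path length, which compresses long distances more than short ones, and in this sketch the smallest sum moves to a split that groups the two long branches together.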
Similar resources
Pitfalls of heterogeneous processes for phylogenetic reconstruction.
Different genes often have different phylogenetic histories. Even within regions having the same phylogenetic history, the mutation rates often vary. We investigate the prospects of phylogenetic reconstruction when all the characters are generated from the same tree topology, but the branch lengths vary (with possibly different tree shapes). Furthering work of Kolaczkowski and Thornton (2004, N...
Efficient biased estimation of evolutionary distances when substitution rates vary across sites.
This paper deals with phylogenetic inference when the variability of substitution rates across sites (VRAS) is modeled by a gamma distribution. We show that underestimating VRAS, which results in underestimates for the evolutionary distances between sequences, usually improves the topological accuracy of phylogenetic tree inference by distance-based methods, especially when the molecular clock ...
Artifactual phylogenies caused by correlated distribution of substitution rates among sites and lineages: the good, the bad, and the ugly.
Despite the advances in understanding molecular evolution, current phylogenetic methods barely take account of a fraction of the complexity of evolution. We are chiefly constrained by our incomplete knowledge of molecular evolutionary processes and the limits of computational power. These limitations lead to the establishment of either biologically simplistic models that rarely account for a fr...
When does the incongruence length difference test fail?
This paper examines the efficiency of the incongruence length difference test (ILD) proposed by Farris et al. (1994) for assessing the incongruence between sets of characters. DNA sequences were simulated under various evolutionary conditions: (1) following symmetric or asymmetric trees, (2) with various mutation rates, (3) with constant or variable evolutionary rates along the branches, and (4...
When is it safe to use an oversimplified substitution model in tree-making?
The choice of an "optimal" mathematical model for computing evolutionary distances from real sequences is not currently supported by easy-to-use software applicable to large data sets, and an investigator frequently selects one of the simplest models available. Here we study properties of the observed proportion of differences (p-distance) between sequences as an estimator of evolutionary dista...
Journal title:
- Mathematical biosciences
Volume 134, Issue 2
Pages -
Publication date: 1996